Can't convert a CUDA tensor to NumPy, because NumPy arrays live in host (CPU) memory. Use tensor.cpu() to copy the tensor to host memory first, then call .numpy() on the result.
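A minimal sketch of the fix, assuming PyTorch and NumPy are installed. `to_numpy` is a hypothetical helper name used here for illustration, not part of PyTorch. It also calls `.detach()`, because `.numpy()` refuses tensors that require grad. The script falls back to the CPU when no GPU is present, and `.cpu()` is then a no-op.

```python
import numpy as np
import torch


def to_numpy(t: torch.Tensor) -> np.ndarray:
    """Return a NumPy array holding t's data (hypothetical helper).

    .detach() drops autograd tracking, since numpy() rejects tensors that
    require grad. .cpu() copies a CUDA tensor to host memory and returns
    the tensor unchanged if it is already on the CPU. In that CPU case the
    returned array shares memory with the original tensor.
    """
    return t.detach().cpu().numpy()


# Use the GPU when one is available so the example also runs on CPU-only machines.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.arange(6, dtype=torch.float32, device=device).reshape(2, 3)

# Calling x.numpy() directly raises a TypeError when x lives on the GPU.
arr = to_numpy(x)
print(arr.shape, arr.dtype)  # (2, 3) float32
```

Recent PyTorch releases also accept `t.numpy(force=True)`, which performs the detach and the CPU copy itself. Check that your installed version supports the `force` argument before relying on it.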